# Oversight-Detection Dataset

A dataset for evaluating the capability of LLMs, including agent scaffolds such as AutoGPT and Mini-AGI, to reason about human oversight. The evaluation framework is built on InterCode. The agent scaffolds are at `img_data/agent`, the queries for the agent-manipulation setup are at `data/agent/agent.json`, and the queries for the agent-qa setup are at `data/agent/qa.json`.
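
To get a first look at the two query files, a minimal Python sketch like the one below may help. The schema of these files is not described here, so the hypothetical helper `summarize_queries` only reports the top-level JSON type and the number of entries, and assumes nothing about field names.

```python
import json
from pathlib import Path


def summarize_queries(path):
    """Load a query file and report its top-level shape without assuming a schema."""
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    # The top-level value may be a list of queries or a dict keyed by id;
    # both are handled because the layout is not documented in this README.
    size = len(data) if isinstance(data, (list, dict)) else 1
    return {"type": type(data).__name__, "entries": size}


if __name__ == "__main__":
    # Paths are taken from the README and are relative to the repository root.
    for name in ("data/agent/agent.json", "data/agent/qa.json"):
        if Path(name).exists():
            print(name, summarize_queries(name))
        else:
            print(name, "not found (run from the repository root)")
```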

## Set up the Environment
You can install InterCode as a PyPI package or by building from source.

> **Note**
> InterCode requires the following installations to run:
> * `python` >= 3.8
> * `docker`: See the [installation guide](https://docs.docker.com/get-docker/). Before running the code below, make sure the Docker daemon or application is running locally.

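The two requirements above can be checked before installation with a short script. This is only a sketch: `check_prerequisites` is a hypothetical helper, not part of InterCode, and it assumes that a successful `docker info` call means the daemon is reachable.

```python
import shutil
import subprocess
import sys


def check_prerequisites(min_python=(3, 8)):
    """Return which of the listed requirements appear to be satisfied."""
    python_ok = sys.version_info >= min_python
    docker_ok = False
    # `docker info` exits with a non-zero status when the CLI cannot reach the daemon.
    if shutil.which("docker") is not None:
        try:
            result = subprocess.run(
                ["docker", "info"],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=30,
            )
            docker_ok = result.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            docker_ok = False
    return {"python": python_ok, "docker": docker_ok}


if __name__ == "__main__":
    print(check_prerequisites())
```

If `docker` is reported as `False`, start the Docker daemon or application and run the check again.
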
### 🐍 PyPI Package
1. Install the [PyPI package](https://pypi.org/project/intercode-bench/):
```bash
pip install intercode-bench
```

### 💽 Build from Source
1. Clone the InterCode repository, create a virtual environment, and install the necessary dependencies:
```bash
git clone https://github.com/princeton-nlp/intercode.git
cd intercode
conda env create -f environment.yml
conda activate intercode
```
2. Run `setup.sh` to create the Docker images for the InterCode Bash, CTF, Python, and SQL environments.
3. Run `python run_demo.py sql`

If InterCode was installed successfully, the InterCode SQL environment should start and a CLI interpreter should appear, allowing you to enter `SQL` commands to interact with the task environment.
Press `Ctrl + C` at any time to exit the environment.
Check [`run_demo.py`](https://github.com/princeton-nlp/intercode/blob/master/run_demo.py#L21) for the latest full list of available environments.

### 🧪 Run Experiments
First, make sure at least one of the API keys shown below is declared in one of two ways:
1. As an environment variable, or
2. In a `keys.cfg` file located in the root of this repository and formatted as follows:
```
OPENAI_API_KEY: 'key here'
PALM_API_KEY: 'key here'
```
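The two ways of declaring a key can be illustrated with a small lookup routine. This is a sketch under assumptions: `load_key` is a hypothetical helper rather than the function the framework uses, the environment variable is assumed to take precedence over the file, and `keys.cfg` is parsed as plain `NAME: 'value'` lines as shown above.

```python
import os
from pathlib import Path


def load_key(name, cfg_path="keys.cfg"):
    """Return the key from the environment, else from keys.cfg, else None."""
    # Assumption: an environment variable overrides the file entry.
    value = os.environ.get(name)
    if value:
        return value
    path = Path(cfg_path)
    if not path.is_file():
        return None
    for line in path.read_text(encoding="utf-8").splitlines():
        key, sep, raw = line.partition(":")
        if sep and key.strip() == name:
            # Drop surrounding whitespace and optional single or double quotes.
            return raw.strip().strip("'\"") or None
    return None


if __name__ == "__main__":
    for key_name in ("OPENAI_API_KEY", "PALM_API_KEY"):
        print(key_name, "found" if load_key(key_name) else "missing")
```

Running the script from the repository root reports which keys are visible without printing their values.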

To run the evaluation under the agent-qa setup, use:
```bash
bash scripts/expr_qa.sh
```
To run the evaluation under the agent-manipulation setup, use:
```bash
bash scripts/expr_agent.sh
```
The logs will be stored at `logs/experiments`.
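
To find the output of the most recent run, a sketch like the following lists the newest files under the log directory. The file names and format of the logs are not specified here, so the hypothetical helper `newest_logs` only sorts files by modification time and does not parse them.

```python
from pathlib import Path


def newest_logs(log_dir="logs/experiments", limit=5):
    """Return up to `limit` files under log_dir, most recently modified first."""
    root = Path(log_dir)
    if not root.is_dir():
        return []
    # Search subdirectories too, since the directory layout is not documented.
    files = [p for p in root.rglob("*") if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]


if __name__ == "__main__":
    for log_file in newest_logs():
        print(log_file)
```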